The Ehrenfest urn revisited: Playing the game on a realistic fluid model 
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The Ehrenfest urn process, also known as the dogs and fleas model, is realistically simulated by 
molecular dynamics of the Lennard- Jones fluid. The key variable is Az, i.e. the absolute value of 
the difference between the number of particles in one half of the simulation box and in the other 
half. This is a pure-jump stochastic process induced, under coarse graining, by the deterministic 
time evolution of the atomic coordinates. We discuss the Markov hypothesis by analyzing the 
statistical properties of the jumps and of the waiting times between the jumps. In the limit of a 
vanishing integration time-step, the distribution of waiting times becomes closer to an exponential 
and, therefore, the continuous-time jump stochastic process is Markovian. The random variable Az 
behaves as a Markov chain and, in the gas phase, the observed transition probabilities follow the 
predictions of the Ehrenfest theory. 
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I. INTRODUCTION 



A fundamental question in statistical mechanics is the 
reconciliation of the irreversibility of thermodynamics 
with the reversibility of the microscopic equations of mo- 
tion governed by classical mechanics. In 1872 Ludwig 
Boltzmann gave an answer with his H-theorem de- 
scribing the increase in the entropy of an ideal gas as 
an irreversible process. However, the proof of this theo- 
rem contained the Stoflzahlansatz, i.e. the assumption of 
molecular chaos. The result was subject to two main ob- 
jections: Loschmidt's Umkehreinwand (reversibility para- 
dox) and Zermelo's Wiederkehreinwand (recurrence 
paradox) Boltzmann's reply to the two objections 
was not fully understood at the time, but is now con- 
sidered as a corner-stone of statistical mechanics. It is 
summarized in the article that Paul and Tatiana Ehren- 
fest wrote for the German Encyclopedia of Mathematical 
Sciences Subsequently, Boltzmann's approach has 
been reformulated in the language of stochastic processes 

sag. 

Essentially, even in the presence of a deterministic mi- 
croscopic dynamics, the coarse graining of configuration 
space due to the observer's state of knowledge results in 
a stochastic process, where the number of particles in a 
given cell varies at random as a function of time. 

Exactly 100 years ago the Ehrenfests gave a sim- 
ple and convincing interpretation of Boltzmann's ideas 
in term of an urn stochastic process that is a periodic 



Markov chain in their original formulation 



a periodic 
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There are N marbles or balls to be divided into two 
equal parts of a box. In order to fix the ideas, let us 
call P the number of balls in the left part and Q the 
number of balls in the right part. The balls are labeled 
from 1 to N. At each step of the process, an integer 
between 1 and N is selected with probability 1/N and 
the corresponding ball is moved from one part to the 
other. Rather than urns and balls, later variants of the 
model used dogs and fleas jumping from one dog to the 
other, but this does not change the mathematics. In- 
deed, according to Ref. [ll|, the Ehrenfests already had 
something similar to fleas in mind because they used the 
verb hiipfen, meaning hop, that is more appropiate for 
fleas than for marbles. Assuming P > Q, in terms of 
the random variable Az = \P — Q\, the unconditional 
equilibrium probability of a certain value of Az is given 
by 
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The transition probabilities of a decrease, pd(Az— 2 | Az) 
and of an increase, p n (Az + 2 \ Az), of Az are given by 
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Eqs. © completely determine the Ehrenfest urn Markov 
chain. It is possible to define an aperiodic version of 
this process, but both versions share the same station- 
ary distribution (invariant measure) given by Eq. {T]), 
that in the aperiodic case is also the equilibrium distribu- 
tion [l(J, [HJ • As noticed by Kohlrausch and Schrodinger 
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[HI [3 1 Eq. {l} can be regarded as the equilibrium distri- 
bution for a fictitious walker obeying a suitable forward 
Kolmogorov equation: 

> A /' • 1 
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By means of this stochastic process, the Ehrenfests were 
able to present convincing evidence in favour of Boltz- 
mann's approach. In this example, the random vari- 
able Az is the analogous of H and it almost always de- 
creases from any higher value; moreover this is true both 
in the direct and reverse time direction as required by 
Loschmidt's Umkehreinwand, and Az is quasiperiodic as 
required by Zermelo's Wiederkehreinwand [5]. 

But what happens if this game is played with a real 
fluid or, more modestly, with a realistic model [ID, [l6| 
of a fluid? As argued by Boltzmann, in this case the 
deterministic microscopic dynamics induces a stochastic 
process and, again, the number of fluid particles in the 
left side of the box P and in the right side of the box Q 
fluctuate as a function of time. Here, the coarse grain- 
ing is simply due to the division into two equal parts of 
the box that contains the fluid. The Markov hypothesis, 
clearly explained by Penrose [|[, is instrumental in de- 
riving the properties of statistical equilibrium. There is, 
however, a further complication. P, Q, and Az can be 
constant for a certain time before changing their values. 
The waiting times between these jumps are randomly 
distributed as well. The mathematical model for such 
a process is called a continous-time pure-jump stochas- 
tic process 10]. A pure-jump process is Markovian if and 
only if the waiting time between two consecutive jumps is 
exponentially distributed (this distribution may depend 
on the initial non-absorbing state) [lOj. The following 
remark is important. It is possible to define a pure-jump 
process by coupling a Markov chain, such as the Ehrcn- 
fest urn process defined above, with a point process for 
the inter-jump waiting times. If the latter is non expo- 
nential, the pure-jump process is non-Markovian. 

In the present work, we investigate the Markovian 
character of the pure-jump process induced by the simu- 
lation of a Lennard- Jones fluid in a box. 



II. METHODOLOGY 

Systems with N = 500, 1000, 2000 and 100 000 atoms 
interacting with the cut and shifted Lcnnard-Jones pair 
potential 



U = £[0i;(r y )-Oii(rc 
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where r^ is the interatomic distance, were simulated 
using classical molecular dynamics [13, [HI- We em- 
ployed a parallelepiped unit box with side ratios 1:1:1 



when N = 1000 or 2:1:1 in the other cases, and peri- 
odic boundary conditions in all three directions of space. 
For N — 1000, we used also two parallel soft walls in the 
x-direction with periodic boundary conditions in the y, z- 
directions only, i.e. "slab" boundary conditions. The wall 
potential was given by integrating the Lennard-Jones po- 
tential over a semi-infinite wall of atoms distributed with 
a density p w [lj|: 



i 
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where r^ w is the atom-wall distance. We did not put 
walls along all three directions of space to avoid too large 
surface effects with small values of N. We use reduced 
units with a = e = m = = 1, where m is the mass 
of each atom and &b is the Boltzmann constant. This 
defines the time unit as a^/m/e and the temperature 
unit as e/fce- We used the common bulk cutoff value 
r cu t — 2.7 and a wall cutoff r™ ut — v/2/5 corresponding 
to the minimum of the wall potential, so that the cut and 
shifted wall potential is purely repulsive. p w was set to 
1, i.e. slightly below the densities of bec (1.06) and fee 
(1.09) lattices. We chose four points in the phase diagram 
with (p, T) = (0.05, 1.2), (0.7,1.2), (0.05,1.6), (0.7,1.6) 
lying around the critical point, whose accepted value for 
the Lennard-Jones fluid is (0.35,1.35) @,[2l|; see Fig. [2 




FIG. 1: The four simulated points (circles) in the phase di- 
agram of the Lennard-Jones fluid. The liquid-vapour curve 
(solid line) is a Bezier fit to data from Ref. l2ll . The critical 
point corresponds to the maximum of the liquid- vapour curve. 

Production runs of 10 million time steps were done in 
the microcanonic ensemble with the velocity Verlet inte- 
grator [22|, [H| , while equilibration runs were performed 
in the canonic ensemble with an extended system ther- 
mostat [23], H3, HE, HI] . At every time-step we measured 
P as the number of atoms on the left part of the box, 
that is with r x < 0. Thus, as mentioned before, one has 
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FIG. 2: The pure-jump stochastic process Az = \P — Q\ as 
a function of the first 1000 time steps of the first simulation 
run in Table [I] 

Az = \P - Q\ = \2P - N\; see Fig. [2] While a time- 
step At = 0.025 is sufficient for an acceptable energy 
conservation in this kind of system [26[, to get a good 
resolution of the waiting times we started employing a 
smaller At = 0.001; for N = 1000, we obtained <J E /\(E)\ 
in the range from 7.0 • 10~ 6 to 1.1 • 10~ 4 depending on 
p and T . Nevertheless, any time-step we tried down to 
0.0001 was still large enough to observe a few percent 
of jumps in Az greater than 2; the shorter the average 
waiting time, the higher the percentage. There were even 
occasional variations greater than 4 or, for some param- 
eter combinations, 6, 8 or 10. 

A trajectory of 10 million time-steps with N = 1000 
took about 20 hours at p = 0.05 and about 80 hours at 
p = 0.7 on a 2.4 GHz Intel Pentium IV processor with 
our own C++ code using Verlet neighbour lists. With 
N = 100 000, the lower density lasted 17.5 hours on 64 
IBM Power4+ processors at 1.7 GHz, and the higher den- 
sity almost 9 days on 64 AMD Opteron 270 processors at 
2.0 GHz, with a Fortran code using domain decomposi- 
tion and linked cell lists (27j. Trajectories of this length 
are the main difference with respect to the pioneering 
simulations of 40 years ago, when for N = 864 atoms 
and p ~ 0.8 one time-step took 45 seconds on a CDC- 
3600 [HI], while trajectories consisted typically of 1200 
time-steps 

[iil- 
iii. RESULTS 

A. Analysis of jumps 

In this section, we study the random variable Az. We 
compare simulation results with the Ehrcnfcst theory to 
see whether Az obeys the Markov-chain equations (P [3} • 

In Fig. [3J the empirical estimate for p cq (Az) is plot- 
ted and compared with Eq. ©. There is visibly a good 
agreement between the quantitative predition of Eq. @ 



and the empirical histogram for the gas phase, and this 
agreement is slightly better for the higher temperature. 
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FIG. 3: Histograms of the values of Az from the runs of the 
N — 1000 systems without walls. The theoretical line given 
by Eq. ([2]) matches the gas states. 

In Fig. [H we report results on the one-step transi- 
tion probabilities. The Ehrenfest prediction is given by 
Eqs. ([3]). Again, in the gas phase of the Lennard- Jones 
fluid there is agreement between the sampled transition 
probabilities and the Ehrcnfcst theory Even if linear in 
Az, the sampled transition probabilities for the liquid 
phase deviate from Eqs. 
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FIG. 4: One-step transition probabilities pa(Az — 2 | Az) and 
p u (Az+2 | Az) for p = 0.7, T = 1.2 (liquid) and p = 0.05, T = 
1.6 (gas), N — 1000 without walls. The theoretical lines 
1/2 ± Az/(2N) H match the gas state. 

Sampled two-steps transition probabilites are plotted 
in Fig.[5j If the process is a Markov chain, these probabil- 
ities must be the product of two one-step transition prob- 
abilities. This property appears satisfied both for the gas 
and for the liquid. Moreover, for the gas, the sampled 
two-steps probabilities follow the Ehrenfest quantitative 
prediction given by Eqs. ([5]). 
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TABLE I: For each integration time-step At, number of atoms 
N, density p and temperature T (a "w" before the N value 
indicates a system with walls in the x- direct ion ) , this table 
gives the number of observed waiting times n, the values of 



the Anderson-Darling statistics A 



the average waiting 



50 60 



time (t), and the standard deviation of waiting times o>. 
Reduced units as defined in Sec. [II] are used throughout, with 
times divided by 0.001. The standard error on (r) is around 
0.02 for p = 0.05 and 0.006 for p = 0.70. The standard error 
on a T is around 0.02 for p = 0.05 and 0.005 for p = 0.70. 
Only significant digits are given in the table. The last digit 
of (r) and a T is of the same order of magnitude as a T /^/n. 
See text for further explanations. 



FIG. 5: Two-steps transition probabilities p c id(Az — 4| Az), 
p du (Az\Az), p ud (Az|A;z) and p uu (Az + 4 1 Az) for p = 
0.05, T = 1.6 (gas, top) and p = 0.7, T = 1.2 (liquid, bot- 
tom), N = 1000 without walls. The theoretical lines are the 
product of the two corresponding one-step transition proba- 
bilities, e.g. puu{Az + 4 | Az) = p u (Az + 4 | Az + 2)p u {Az + 
2 | Az). We use the theoretical one-step transition probabili- 
ties for the gas and the observed ones for the liquid. 



Even if, rigorously speaking, we have not shown that, 
for all n, the n-step transition probabilities are the prod- 
uct of n one-step transition probabilities (see Ref. [28| for 
processes obeying the semigroup property that are not 
Markov chains), at least we can claim that we have not 
been able to falsify the Markov-chain hypothesis for Az 
based on our statistics in all the investigated cases. Re- 
markably, the pure Ehrenfest Markov-chain theory is a 
good approximation for the gas, but does not work for 
the liquid. 



B. Analysis of waiting times 

The results of the simulations regarding the wait- 
ing time distribution are summarized in Table |TJ The 
Anderson-Darling statistics A 2 reported in the sixth col- 



umn results from (29| 

A 2 = \-Y,^-^-[\nnr n+ i-i) (7) 

+ ln(l-*(r i ))]-n| (l + ^) , 

where vE'(t) denotes the survival function, a short name 
for the complementary cumulative distribution function, 
i.e. the probability that waiting times are larger than r. 
In Eq. ([7]) the waiting times are sorted: t\ < t 2 < . . . < 
t„. The limiting value at 1% significance for accepting 
the null hypothesis of exponentially distributed waiting 
times is 1.957. Therefore, the null hypothesis can be re- 
jected in all cases with At > 0.0001. The average waiting 
time (r) and the standard deviation a T of the observed 
distribution, reported in columns seven and eight, must 
coincide for an exponential distribution. Even if their 
values are close, with the given statistics they cannot be 
considered equal. Fig. [5] further illustrates this point; 
there, the closest case to an exponential for N = 1000 is 
presented, p = 0.05, T — 1.2 without walls, as well as 
the most distant case, p — 0.7, T = 1.6 without walls. 
In both cases the points are the observed survival func- 
tion, ^E'(t), and the dashed line is the exponential fit. A 
deviation from the exponential distribution is evident at 
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FIG. 6: N = 1000, no walls. Comparison between the ob- 
served survival functions and the theoretical exponential sur- 
vival functions (dashed lines) with the corresponding average 
waiting time (r), for the closest case (squares) and the most 
distant case (circles). The theoretical exponential survival 
function of the system with TV = 2000 and At = 10" 5 is 
shown for reference (continuous line). 



first sight. It is important to remark that this is a one- 
parameter fit, since the average waiting time (r) is suffi- 
cient to fully determine the exponential distribution, with 
survival function ^ exp (T) = exp(— t/(t}), corresponding 
to a given data set. In other words, the mere fact that 
in log-linear scale the survival function is approximately 
a straight line is not sufficient to conclude that the ob- 
served distribution is exponential. In the four cases stud- 
ied here, the presence of walls does not significantly affect 
the results. 

However, the agreement improves if the integration 
time-step At is reduced from 0.001 to 0.0002: for p — 
0.05, T = 1.2 in the N = 2000 system, A 2 drops from 
2061 to 29.84 and (t) from 16.29 to 15.72; the lower 
value of (t) corresponds better to the observed survival 
function. The data change very little with respect to 
At = 0.001 and are not shown in Fig. [6] to avoid clut- 
tering. This indicates that the discrepancy is due to 
the finite integration time-step and can be controlled 
through the latter. The hypothesis is confirmed reduc- 
ing At further: for At = 10~ 4 , A 2 = 3.78, and finally 
for At = 10~ 5 , A 2 = 0.686 < 1.957, i.e. the required 
threshold. The same trend is evident in the N = 100 000 
system, see Fig. though even smaller time steps would 
be necessary to reach the threshold because the average 
waiting time decreasess inversely proportionally to the 
interface area. 

As suggested by intuition, the average waiting time 
decreases with higher density and temperature, but also 
whith a larger interface area S between the two parts of 
the box. Actually, the product {t)S is a constant for a 
given density and temperature. The survival functions 
of systems with different sizes overlap if (r) is multiplied 
with the interface area. This is shown in Fig. [5J where it 
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FIG. 7: Reducing the integration time-step At improves the 
agreement between the observed survival function and an ex- 
ponential function with a time constant equal to the average 
waiting time; system with N = 100 000, p = 0.05, T = 1.2. 

is also clear that there are no changes due to the finite 
size of the system for N > 1000 (after correcting for the 
interface area, the survival function of N = 500 is slightly 
displaced from all the others). 
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FIG. 8: Survival functions for p = 0.7, T1.6) and different 
system sizes. They overlap if (r) is multiplied with the ratio 
of the interface area to the interface area of the systems with 
N — 1000 or 2000 (that are equal because the former is the 
only one with a cubic unit box, while all the others have side 
ratios of 2:1:1). A finite-size effect is noticeable only on the 
smallest system. 

A better strategy than reducing the time-step is to in- 
terpolate the time of the barrier crossing within a conven- 
tional time-step: this way the waiting times can be deter- 
mined with floating-point precision rather than as inte- 
ger multiples of At, there will not be changes in Az > 2, 
and it is likely that good results can be obtained with 
the maximum At compatible with energy conservation. 
Though we believe that the major effect of a finite At is 
through sampling, because without interpolation waiting 
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times are systematically overestimated by a fraction of 
At, another effect is through the approximation of the 
true canonical dynamics. Indeed, with a soft potential 
this approximation can be reduced only in the limit of 
At — ► 0, but it can be avoided completely in a system of 
hard spheres. Work on both lines, interpolation of the 
waiting times and hard spheres, is in progress. 

IV. CONCLUSIONS 

In summary, we have studied the Ehrenfest urn whith 
a realistic model of condensed matter, the Lennard-Jones 
fluid. The Ehrenfest urn has been defined by Mark Kac 
the best model ever envisaged in statistical mechanics 
[30| . yet it has also been criticized as a marvellous ex- 
ercise too far removed from reality [TTj | . In the 100th 
anniversary of the Ehrcnfcsts' original paper, we have 
shown that this criticism is unjustified, since computer 
"experiments" allow to follow the motion of molecules 
and to count how many are on one side of a box or the 
other at a given time. We have studied the behaviour 
of the pure-jump stochastic process Az = \P — Q\ in- 
duced by the deterministic dynamics under coarse grain- 
ing, where P is the number of fluid particles on the left- 
hand side of the simulation box and Q that on the right- 
hand side. We have performed simulations with peri- 
odic boundary conditions and with walls in one direc- 
tion, finding that the presence of walls does not affect 



the results. We have found that in the gas phase the ob- 
served transition probabilities follow the predictions of 
the Ehrenfest theory, and that the waiting time distri- 
bution between successive variations of Az, though not 
strictly exponential, becomes closer to an exponential re- 
ducing the integration time-step; therefore, in the limit 
of a vanishing time-step, we found that the correspond- 
ing pure-jump process is Markovian. To our knowledge, 
this is the first characterization of a pure-jump stochas- 
tic process induced by a deterministic dynamics under 
coarse-graining. In the future, we plan to further study 
the stochastic process presented here interpolating the 
waiting times to higher precision, simulating systems of 
hard spheres to avoid approximations in the dynamics 
due to a finite integration time-step, and investigating 
the pure-jump process in a coarse-grained configuration 
space as required by the theory developed by Boltzmann. 
Our results so far corroborate the Markovian hypothesis 
lying at the foundation of statistical mechanics [||. 
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